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Abstract 

The  estimation  of  3D  Euclidean  coordinates  of  features 
from  2D  images  is  a  problem  of  significant  interest.  In 
this  paper  we  develop  a  3D  Euclidean  position  estima¬ 
tion  strategy  for  a  static  object  using  a  single  moving 
camera  whose  motion  is  known.  The  Euclidean  depth 
estimator  which  is  developed  has  a  very  simple  mathe¬ 
matical  structure  and  is  easy  to  implement.  Numerical 
simulations  and  preliminary  experimental  results  using  a 
mobile  robot  in  an  indoor  environment  are  presented  to 
illustrate  the  performance  of  the  algorithm.  An  extension 
of  this  estimation  technique  for  a  paracatadioptric  system 
is  also  presented. 

1  Introduction 

The  use  of  a  camera  to  estimate  the  3D  structure  of  an  ob¬ 
ject  from  2D  images  is  known  as  “Structure  from  Motion 
(SFM)”  [1,  2,  3],  or  “Simultaneous  Localization  and  Map¬ 
ping  (SLAM)”  (see  [4,  5],  and  references  therein).  The 
problem  usually  involves  a  camera  mounted  on  a  moving 
platform,  such  as  an  unmanned  aerial  vehicle  (UAV)  or 
a  mobile  robot,  which  is  utilized  to  map  the  Euclidean 
position  of  static  landmarks  or  visual  features  in  the  en¬ 
vironment.  Recently,  SLAM  and  SFM  have  been  utilized 
for  a  number  of  applications  including  aerial  tracking  and 
surveillance  of  ground  based,  stationary  or  moving  ob¬ 
jects  [6,  7,  8,  9],  and  terrain  mapping  systems  [10,  11,  12], 

Most  of  the  previous  results  in  this  area  are  formu¬ 
lated  using  linearization  based  techniques  such  as  the  ex¬ 
tended  Kalman  filter  [3,  4,  5].  It  has  been  noted  [13] 
that  the  linearized  motion  models  can  cause  significant 
inconsistencies  in  solutions.  There  have  been  a  few  re¬ 
sults  [14,  15,  16],  which  utilized  nonlinear  system  analy¬ 
sis  and  estimation  tools  to  design  nonlinear  observers  for 
the  problem.  In  recent  work,  Chitrakaran  et  al.  [17,  18] 
proposed  nonlinear  estimation  strategies  to  identify  the 
Euclidean  structure  of  an  object  using  a  monocular  cali¬ 
brated  moving  camera.  The  camera  motion  in  this  work 
was  modeled  based  on  the  homography  between  two  dif¬ 
ferent  views  captured  from  the  camera,  the  current  frame 
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and  a  constant  reference  frame.  The  algorithms  reported 
by  Chitrakaran  et  ah,  require  that  at  least  one  distance 
between  two  features  on  the  object  be  known  for  the  re¬ 
construction  of  the  3D  Euclidean  coordinates.  Also,  to 
decompose  the  homography  and  obtain  the  rotation  and 
translation  of  the  camera  between  the  two  camera  views, 
the  normal  vector  to  the  object  must  be  known  [18]  and 
in  the  case  of  [17],  the  rotation  between  the  object  frame 
and  the  camera  at  the  reference  position  must  also  be 
known. 

In  this  work,  our  objective  is  to  estimate  the  3D  Eu¬ 
clidean  structure  of  a  static  object  using  a  single  camera 
mounted  on  a  moving  platform  whose  translation  and  ro¬ 
tation  velocities  are  measurable.  Although  the  work  in 
[17,  18],  was  fundamentally  more  challenging,  since  the 
camera  velocity  was  unknown,  it  did  make  some  assump¬ 
tions  on  the  structure  of  the  object  which  it  was  to  iden¬ 
tify.  There  are  applications  such  as  video  surveillance 
and  mapping  using  a  UAV  or  a  mobile  robot  where  the 
velocity  of  the  camera  mounted  on  the  moving  platform 
is  readily  available.  Thus,  the  goal  of  this  work  is  to  elim¬ 
inate  the  requirements  from  the  previous  works  [17,  18], 
that  the  distance  between  two  feature  points  be  known, 
and  that  the  normal  vector  or  rotation  matrix  be  known 
a  priori.  The  development  in  this  work  is  similar  to  the 
concepts  introduced  in  [19,  20],  where  range  observer’s 
were  developed  for  feature  points  on  an  object  undergoing 
affine  motion  with  known  motion  parameters.  However, 
the  development  in  our  work  is  based  on  the  kinematics 
of  the  moving  camera  and  has  a  simpler  mathematical 
formulation. 

To  design  the  estimator,  the  equations  for  motion  kine¬ 
matics  are  first  developed  in  terms  of  Euclidean  and 
image-space  information  based  on  a  single  camera  view 
[21].  Then,  a  nonlinear  integral  observer  [22],  is  uti¬ 
lized  to  estimate  the  velocity  of  each  feature  point  in  the 
image  plane.  Once  the  estimate  of  the  image  velocity 
is  known  a  simple  estimator  can  be  developed  for  the 
depth  variable,  and  hence,  the  3D  structure  can  be  esti¬ 
mated.  The  developed  estimator  asymptotically  identifies 
the  Euclidean  depth  subject  to  an  observability  condition. 
This  condition  is  similar  to  the  observability  condition  of 
[19,  20]  and  the  persistency  of  excitation  condition  in  [18]. 
The  proposed  estimator  was  implemented  using  a  camera 
mounted  on  a  mobile  robot  and  our  preliminary  exper¬ 
imental  results  show  that  the  estimator  converges  very 
quickly  and  is  not  computationally  complex,  and  hence, 
can  be  used  for  real-time  applications.  As  an  extension 
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of  this  work,  we  show  how  the  proposed  Euclidean  es¬ 
timation  strategy  can  be  applied  to  a  paracatadioptric 
system. 

The  remainder  of  this  paper  is  organized  as  follows,  in 
Section  2,  the  geometric  model  which  relates  Euclidean 
coordinates  of  visual  features  on  the  stationary  object 
with  their  corresponding  image  pixel  coordinates  is  devel¬ 
oped  based  on  the  perspective  projection  model.  Section 
3,  describes  the  motion  kinematics  between  the  camera 
and  the  object.  Section  4,  describes  the  velocity  estima¬ 
tor  which  is  used  to  estimate  the  pixel  coordinate  veloc¬ 
ity  of  the  visual  features,  and  in  Section  5,  the  Euclidean 
depth  estimator  is  developed.  Finally,  numerical  simula¬ 
tion  and  preliminary  experimental  results  using  a  mobile 
robot  in  an  indoor  environment  are  presented  in  Section  6 
and  Section  7,  respectively.  In  the  appendix  an  extension 
of  this  estimation  strategy  to  a  Paracatadioptric  system 
is  presented. 


2  Geometric  Model 

Figure  1  shows  the  geometric  relationship  between  a  mov¬ 
ing  perspective  camera  and  features  on  a  static  object  in 
its  field  of  view.  The  geometric  model  developed  in  this 
section  is  based  on  a  single  view  of  the  object  from  the 
camera  at  a  time  varying  position  denoted  by  X.  The 
vector  fhi(t)  £  R3  denotes  the  3D  Euclidean  position  of 
the  ith  feature  point  Oi  relative  to  the  camera  frame  X, 
and  is  defined  as 


The  corresponding  projective  pixel  coordinates  of  the  fea¬ 
ture  points  are  denoted  by  pi(t)  £  R3  which  is  defined  as 
follows 

Pi  =  [  Ui  Vi  1  ] 1  .  (3) 

The  image  coordinates  of  the  features  and  their  normal¬ 
ized  Euclidean  coordinates  are  related  by  the  pin-hole 
camera  model  [21]  such  that 


Pi  =  Ami  (4) 

where  A  £  R3x3  is  a  known,  constant,  and  invertible 
intrinsic  camera  calibration  matrix  defined  as  follows  [23] 


A  = 


’  fku 

0 
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fkucot{(j)) 

fkv 


sin{4> ) 

0 


Uo 

Vo 
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(5) 


where  uq,  vo  £  M  denote  the  pixel  coordinates  of  the  prin¬ 
cipal  point  (i.e.,  the  image  center  that  is  defined  as  the 
frame  buffer  coordinates  of  the  intersection  of  the  optical 
axis  with  the  image  plane),  kUlkv  el  represent  camera 
scaling  factors,  <j>  £  R  is  the  angle  between  the  camera 
axes,  and  f  £  R  denotes  the  camera  focal  length. 


3  Camera  Kinematics 

The  kinematics  of  the  camera  frame  X  is  developed  in 
terms  of  the  image  coordinates  of  the  feature  points.  Af¬ 
ter  taking  the  time  derivative  of  (4),  the  following  kine¬ 
matics  can  be  obtained  (see  [24]  for  more  details) 


mi  =  [  Xi  yi 


Cl) 


The  image  coordinates  of  the  feature  points  as  captured 


(Object) 


Figure  1:  Geometric  relationship. 

by  the  moving  camera  are  the  normalized  Euclidean  co¬ 
ordinates  of  the  feature  points,  denoted  by  mi(t)  £  R3 
which  is  defined  as 


(2) 


Pi  =  ~~ AeiVc  +  Aei  [A  1pi]  x  LOc  (6) 

where  vc,uc  £  R3  denote  the  translational  and  rotational 
velocities  of  the  camera  relative  to  the  initial  position 
of  the  camera  but  expressed  in  the  local  frame  I,  and 
Aei  £  R3x3  is  a  function  of  the  intrinsic  camera  calibra¬ 
tion  matrix  and  the  image  coordinates  of  the  ith  feature 
points  image  coordinates,  defined  as  follows 


Aei  —  A  — 


0  0  Ui 
0  0  Vi 
0  0  0 


(7) 


and  [£]  x  denotes  the  following  skew  symmetric  matrix 


[C]x  = 
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0 

-Cl 
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For  the  remainder  of  this  development  only  the  first  two 
elements  of  pi(t)  defined  in  (6)  are  considered.  Thus,  the 
2D  kinematics  for  the  camera  can  be  written  as 

Xi  =  H.iVc  +  Hi  [A~lpi\  x  wc  (9) 


where  Xi  (t)  £  R2  is  expressed  as 

Xi=[Ui  Vl]T  (10) 
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and  II*  €  R2x3  consists  of  the  first  two  rows  of  the  matrix 
Aei  which  was  defined  in  (7),  and  can  be  explicitly  written 
as  follows 


n, 


fku  fkucot(<j>)  u0  -  Ui 

0  -^TT\  Vo  —  Vi 

stn((p)  u  L  J 


(ii) 


4  Image  Feature  Velocity  Estima¬ 
tion 


The  only  unknown  in  the  camera  kinematic  equation  (9), 
is  the  Euclidean  depth  Zi(t).  To  facilitate  the  develop¬ 
ment  of  an  estimator  for  the  depth  parameter,  an  esti¬ 
mate  of  the  image  velocity  signal  Xi(t)  is  required.  The 
following  continuous  estimator  [22]  can  be  utilized  to  es¬ 
timate  the  velocity 

Xi  =  f  (Ki  +  I2)Xi(T)  +Tisgn(Xj('r))  dr 
Jto  1  J 

+  (Ki  +  I2)  Xi{t)  (12) 

where  Xi(t)  =  [  u,  ii  ]  G  R2  denotes  the  estimate 
of  the  signal  Xi(t),  Xi(t)  G  R2  is  the  estimation  error 
defined  as  follows 

Xi(t)±X(t)-X(t),  (13) 

Ki,Ti  G  R2x2  denote  constant  positive  definite  diagonal 
gain  matrices,  and  sgn(X,;)  denotes  the  signurn  function 
applied  to  each  element  of  the  vector  Xi(t).  For  more 
details  on  the  development  of  the  above  estimator  the 
reader  is  referred  to  [22].  To  summarize  the  result,  it  was 
shown  that  the  estimator  in  (12)  asymptotically  identifies 

the  signal  Xi{t)  (i.e.,  Xi(t)  ,  Xi(t)  — >  0  as  t  — >  oo), 

provided  that  the  jth  diagonal  element  of  the  gain  matrix 
Tj  and  the  jth  element  of  the  vectors  Xi(t)  and  Xi(t) 
satisfies  the  following  condition  for  all  i  feature  points 

[Ti]j>  fV(i)l  +  [Xi(t)]  Vj  =  1,2.  (14) 

L  J  j  J 

Thus  the  only  restriction  on  the  camera  motion  is  a  rela¬ 
tively  mild  assumption  of  the  smoothness  and  bounded¬ 
ness  of  the  higher  order  derivatives  of  the  camera  velocity. 


5  Euclidean  Depth  Estimation 

The  objective  is  to  design  an  estimator  for  the  Euclidean 
depth,  Zi{t).  To  this  end,  the  kinematic  equation  (9),  can 
be  rewritten  in  a  simplified  form  as  follows 

Xi  =  -PiXi  +  5i  (15) 

where  A,;  =  [  Xu  Xi2  ]T  G  R2,  Si  =  [  6a  Si2  ]J  G  M2 
are  measurable  signals  which  are  defined  as  follows 

A,;  =  Ilj  vc  (16) 

5i  =  Ilj  [A-1  pi]  xu>c  (17) 


and  pi(t)  =  — G  R  is  the  inverse  of  the  Euclidean 
Zi{t) 

depth  which  is  unknown  and  will  be  estimated. 

The  individual  components  of  the  simplified  expression 
for  the  camera  kinematics  in  (15),  can  be  written  as 

Xa  +  Xa  =  —piXa  +  Sa  (18) 

Xi2  +  Xi2  =  ~PiXi2  +  Si2  (19) 

where  =  [  Xa  Xi2  ]  ,  =  [  Xa  Xi2  ]  ,  and 

(13)  was  utilized.  After  multiplying  (18)  by  Xa(t)  and 
(19)  by  Xl2(t).  and  rearranging  the  resulting  equations, 
the  following  expressions  can  be  obtained 


Pi\  1 

-< 

II 

-  in) 

—  XaXa 

(20) 

Pi^i2 

CM 

C4 

II 

—  x^ 

—  Xi2Xi2. 

(21) 

After  adding  (20)  and  (21),  the  following  expression  is 
obtained 

Pi  (A?i  +  Xl)  =  Xa  (da  -  Vi)  +  Ai2  [si2  -  ii2) 

— XaXa  —  Xi2Xi2.  (22) 


Based  on  the  expression  in  (22),  an  estimate  for  the  in¬ 
verse  Euclidean  depth  can  be  designed  as  follows 


where  pi(t)  G  R  represents  the  inverse  depth  estimate  and 
the  inverse  depth  estimation  error  pi{t)  =  pi(t)—pi(t)  G  R 
is  explicitly  defined  as  follows 

Pi  =  7vi —  \2  \  XaXa  +  Xi2Xi2  .  (24) 

(Aji  +  \2)  L  J 

Notice  that,  since  the  image  feature  velocity  estima¬ 
tor  asymptotically  converges  to  the  true  velocity  (i.e., 

Xa(t),Xi2(t)  — ►  0),  the  inverse  depth  estimation  error 
converges  to  zero,  (i.e.,  pi(t)  — >  0).  Thus,  the  inverse 
depth  estimate  Pi(t).  converges  to  its  true  value  provided 

that,  X(t)  — >  X(t)  and  (A?1(t)  +  A f2(t))  ^  0.  From  (15), 
it  is  evident  that,  if  (A 2x(t)  +  A22(t))  =  0,  then  the  in¬ 
verse  depth  estimate  pi(t)  is  unobservable.  Thus,  we  can 
conclude  that  the  inverse  depth  estimate  can  be  asymp¬ 
totically  identified  provided  that  A 2:(t)  +  A 22(f)  ^  0  and 
the  gain  condition  in  (14)  is  satisfied. 


6  Simulation  Results 

A  simulation  study  was  conducted  to  evaluate  the  perfor¬ 
mance  of  the  proposed  estimation  algorithm.  The  sim¬ 
ulations  were  performed  using  five  static  feature  points 
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whose  Euclidean  coordinates  were  selected  as  follows 


1.2 


Oi  =  [0  0.2  1  ]j 

02  =  [  -o.l  0.2  1.25  ]T 

03  =  [  0.1  0.2  1.5  ]T  (25) 

Oi  =  [  -0.2  0.2  1.75  ]T 

0.5  =  [  0.2  0.2  2  ] T . 

The  cameras  translational  and  rotational  velocities  were 
chosen  as 

vc  =  [  0.2 cos(t)  0.2 sin(t)  0.1sin(t)  ]T  m/s 
ujc  =  [  0  0  0.1sm(0.27rt)  ]T  rad/s  .  (26) 
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In  addition,  a  camera  calibration  matrix  for  a  640  x  480 
camera  was  selected  as  follows 


810  0  320 

0  820  240 

0  0  1 


(27) 


The  estimator  gains  were  chosen  to  give  the  best  perfor¬ 
mance  both  with  and  without  additive  noise  and  were 
selected  as  follows 


Ki  =  diag{  20,20},  Tj  =  diag{  3,3}.  (28) 

In  the  simulations  four  different  cases  were  considered 
using  the  above  parameters.  For  case  1,  the  image  points 
had  no  noise  added  to  them.  In  case  2,  a  small  amount 
of  noise  (variance  0.001)  was  added  to  the  image  points. 
For  case  3,  noise  with  a  variance  of  0.0001  was  added  and 
image  points  were  passed  through  a  low-pass  filter.  The 
low  pass  filter  had  a  cutoff  frequency  of  2  Hz.  In  the 
final  case,  the  image  points  were  rounded  to  integers  to 
simulate  the  discrete  output  of  the  feature  tracker,  these 
image  points  were  then  passed  through  the  low  pass  filter. 

The  simulation  results  for  each  of  the  four  cases  is 
shown  in  Table  1.  Note  that,  feature  points  that  are  fur¬ 
ther  from  the  camera  generally  have  a  larger  error.  The 
highest  percent  error  was  4.3%  for  case  2  with  the  feature 
point  at  a  distance  of  2  m  from  the  camera.  The  depth 
estimation  error  for  the  four  cases  considered  in  the  sim¬ 
ulations  are  shown  in  figures  2,  3,  4,  5.  Figure  6  shows  a 
comparison  of  the  depth  estimation  error  using  the  cur¬ 
rent  algorithm  and  the  algorithm  from  [18]  for  a  single 
feature  point. 


7  Experimental  Results 


In  this  section,  preliminary  experimental  results  using  a 
mobile  robot  are  discussed.  A  standard  off  the  shelf  web¬ 
cam  (Logitech  QuickCam)  was  used  to  capture  images  at 
a  resolution  of  640  x  480  pixels.  The  calibration  matrix 
of  the  camera  was  found  to  be  the  following 


726.6  0  333.3 

0  760.8  226.2 

0  0  1 


Figure  2:  Simulation  case  1:  Depth  estimation  error  with¬ 
out  additive  noise. 


Figure  3:  Simulation  case  2:  Depth  estimation  error  with 
noise  of  variance  0.001. 


Figure  4:  Simulation  case  3:  Depth  estimation  error  with 
noise  of  variance  0.0001  and  filtering. 


(29) 
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Figure  5:  Simulation  case  4:  Depth  estimation  error  with 
integer  rounding. 


Figure  6:  Simulation  comparison  of  the  depth  estimation 
error  for  a  single  feature  point. 


Table  1:  Simulation  errors  in  depth  estimation  for  each 
case _ 


Featur 

3  Depth 

Error 

Error 

Error 

Error 

point 

(m) 

Case 

Case 

Case 

Case 

1 

2 

3 

4 

(cm) 

(cm) 

(cm) 

(cm) 

0 1 

1.0 

1.6 

3.0 

1.5 

1.5 

o2 

1.25 

2.0 

4.1 

2.4 

2.5 

o3 

1.50 

2.2 

5.3 

2.2 

2.6 

o4 

1.75 

2.7 

6.9 

3.9 

4.6 

o5 

2.0 

3.0 

8.5 

3.7 

4.9 

The  camera  was  mounted  on  an  ActivMedia  Robotics  Pi¬ 
oneer  3  mobile  robot  as  shown  in  Figure  7.  The  mobile 
robot’s  on-board  controller  provides  translational  and  ro¬ 
tational  velocity  information  using  wheel  mounted  optical 


encoders.  The  test  scene  consisted  of  a  doll  house.  Both 
the  mobile  robot  and  the  camera  were  connected  to  a  lap¬ 
top  with  an  Intel  Centrino  Duo  2  GHz  processor  and  1 
GB  of  memory.  The  laptop  was  used  to  set  the  velocity  of 
the  robot,  capture  images  of  the  scene,  and  log  the  video 
and  velocity  data  for  off-line  processing.  For  the  prelimi¬ 
nary  tests,  the  robot  was  given  a  translational  velocity  of 
5  cm/s  along  the  X-axis  and  no  rotational  velocity.  The 
average  frame  rate  obtained  using  the  webcam  was  14.2 
frames  per  second. 


'■'\Pioneer 

robot 


Figure  7:  Experimental  test  setup  with  camera,  mobile 
robot,  and  doll-house  scene. 

Using  the  implementation  of  the  Lucas-Kanade  feature 
tracking  algorithm  provided  in  the  OpenCV  computer 
vision  library  [25],  a  computer  program  was  written  in 
C++  which  enabled  the  user  to  select  features  manually 
and  track  those  features  for  the  entire  image  sequence. 
The  program  created  a  text  file  which  contained  the  fea¬ 
ture  point  pixel  coordinates  and  camera  velocity  for  each 
frame.  In  the  experiment,  12  features  were  selected.  A 
sample  frame  with  the  tracked  feature  points  is  shown  in 
Figure  8. 


Figure  8:  A  frame  from  the  doll-house  image  sequence 
showing  the  tracked  feature  points. 
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The  depth  estimation  was  calculated  off-line  using 
Mathworks  Simulink  program.  A  low-pass  filter  with  a 
cutoff  frequency  of  1  Hz  was  applied  to  the  feature  points 
and  camera  velocities  from  the  text  file  to  smooth  the 
data.  The  following  velocity  estimator  gains  provided 
good  performance 

Ki  =  diag{  5,5},  Tj  =  diag{\,  1}.  (30) 

The  estimated  distance  between  features  is  shown  in  Fig¬ 
ure  9.  Note  that  the  estimated  values  stabilize  in  under 
1  second.  The  estimation  error  is  shown  in  Figure  10. 
To  illustrate  how  the  image  velocity  estimator  is  useful, 
the  image  velocity  estimator  was  replaced  with  a  deriva¬ 
tive  operator.  The  distance  estimation  error  was  seen  to 
be  much  higher  without  the  image  velocity  estimator.  In 
fact,  the  estimation  error  more  than  doubled  when  the 
velocity  estimator  was  replaced  with  the  simple  deriva¬ 
tive  operation.  Figure  11  shows  the  estimated  distance 
between  features,  and  Figure  12,  shows  the  estimation 
error  for  the  case  with  the  derivative  operation. 

Table  2:  Experimental  error  in  distance  estimation 


Object 

Actual 

distance 

(cm) 

Estimated 

distance 

(cm) 

Length  I 

10.0 

9.49 

Length  II 

23.7 

23.4 

Length  III 

40.0 

39.6 

Length  IV 

33.7 

33.2 

Length  V 

24.5 

24.2 

Length  VI 

24.5 

24 

Figure  9:  Experiment:  Estimated  distance  between  fea¬ 
tures. 


Figure  10:  Experiment:  Distance  estimation  error. 


Figure  11:  Experiment:  Estimated  distance  between  fea¬ 
tures  with  derivative  operator. 


Figure  12:  Experiment:  Distance  estimation  error  with 
derivative  operator. 


6 


8  Summary  and  Future  Work 

In  this  paper,  we  have  presented  an  estimation  strategy 
for  3D  Euclidean  reconstruction  of  static  features  on  an 
object  using  a  single  moving  camera  whose  velocities  are 
known.  The  proposed  estimator  has  a  simple  mathemat¬ 
ical  structure  and  can  be  easily  implemented.  Numerical 
simulations  and  preliminary  experimental  results  using  a 
mobile  robot  in  an  indoor  environment  were  presented. 
These  results  demonstrate  that  the  estimation  strategy  is 
accurate  and  converges  quickly,  in  under  one  second,  even 
with  a  poor  resolution  and  low  frame  rate  camera.  The 
proposed  estimation  strategy  was  also  shown  to  be  appli¬ 
cable  to  a  paracatadioptric  system.  Our  future  work  will 
consider  more  complex  trajectories  for  the  mobile  robot 
and  utilization  of  a  high  definition  video  camera  which  is 
expected  to  provide  more  accurate  results.  Further  ex¬ 
perimental  validation  using  a  video  camera  mounted  on  a 
UAV  is  also  being  considered.  The  real-time  performance 
of  this  algorithm  is  also  currently  being  evaluated. 


where  we  have  used  the  following  expression  to  describe 
the  motion  of  the  Euclidean  co-ordinate  fh{t)  as  seen  from 
the  moving  camera  frame  [24] 


to  =  —  vc  —  [wc]x  to.  (35) 

In  (35),  vc,loc  £  R3  denote  the  translational  and  rota¬ 
tional  velocity  of  the  camera  frame,  and  [•]  x  was  defined 
in  (8).  In  (34),  fii(t)  £  R3  consists  of  measurable  and 
known  signals  and  g(t)  £  R3  contains  the  unknown  Eu¬ 
clidean  information  which  must  be  estimated.  The  signals 
fii(f),c/(t),  are  defined  as  follows 


fk  =  -  M  (~y20Jci  +  2/iWc2)  y  (36) 


g  =  -rvc 


vC3  yix  +  y2y  +  V3z\  _ 

1 - 2/(!  +  *3)  )  *  <37) 


Similar  to  the  development  in  Section  4,  the  velocity 
estimator  [22]  can  be  used  to  develop  an  estimate  of  the 
signal  y(t)  as  follows 


APPENDIX 

A  Extension  for  a  Paracatadiop¬ 
tric  System 

A  paracatadioptric  system  is  a  central  catadioptric  sys¬ 
tem  with  a  paraboloid  mirror  and  an  orthographic  camera 
[26,  27] .  This  type  of  system  has  a  large  field  of  view  and 
thus  is  suitable  for  structure  from  motion.  In  this  sec¬ 
tion  we  consider  the  Euclidean  reconstruction  of  a  static 
object  using  the  2D  images  captured  from  a  paracatadiop¬ 
tric  system  mounted  on  a  moving  platform  whose  velocity 
is  measurable.  The  development  presented  is  for  a  single 
feature  point. 

The  projection  of  the  Euclidean  coordinate  fh(t)  = 
[  x  y  z  ]T  £  R3  onto  a  paraboloid  mirror  is  denoted 

by  the  coordinate  y{t)  =  [  y\  yi  y%  ]T  £  R3  which  is 
defined  as  follows  [28] 


2/  = 


[( K  +  h)  y(r )  +  Tsgn(y(r))]dT 


\K+h)m 


(38) 


where  K,  T  £  R3x3  denote  constant  positive  definite 
diagonal  gain  matrices,  and  y(t)  =  y(t)  —  y(t)  de¬ 
notes  the  estimation  error  for  the  signal  y(t).  The  es¬ 
timator  in  (38)  asymptotically  identifies  the  signal  y(t) 
(i.e.,  §{t)  ,  ||))(t)||  ->  0  as  t  ->  oo). 

Since  fii(i)  is  measurable,  an  estimate  g(t)  = 

\  9i  92  93  ]T  G  R3  for  the  signal  g(t)  can  be  defined 

as  follows 

g=-Q,x+y  (39) 


where  g(t)  =  g(t )  —  g(t)  denotes  the  estimation  error  for 
the  signal  g{t).  Thus,  the  signal  g(t)  is  asymptotically 
identified  provided  that  y(t)  — *  y.  From  (37),  an  expres¬ 
sion  for  the  unknown  Euclidean  information  contained  in 
r(t)  can  be  developed  as  follows  [28] 


y  =  rm  (31) 

where  r(t)  £  R  is  defined  as  follows 


where  f  £  R  denotes  the  constant  known  focal  length 
of  the  system  and  l(fn)  £  R  is  the  following  nonlinear 

function  _ 

l  =  —  z  +  \J  x1  +  y1  +  z1.  (33) 

Note  that,  since  the  projection  from  the  mirror  onto  the 
camera’s  sensor  is  orthographic  the  signal  y(t)  is  directly 
measurable. 

After  taking  the  time  derivative  of  (31),  the  following 
expression  for  the  kinematics  of  y(t)  can  be  obtained  in  a 
similar  manner  to  [28] 

V  =  +  9  (34) 


(2/231  -  yihf  +  (2/3.91  -  2/133)2 

t>  =  +  fefe  -  vAf -  (40) 

(V2VCl  -  2/1*0  +  (2/3 Uci  -  2/1  Wca  ) 

+  (*/3**c2  -  2/2*02 

where  r(t)  £  R  denotes  an  estimate  for  r(t),  and  vc(t)  = 
[  **ci  **c2  **C3  ]T-  The  expression  in  (40)  is  obtained 

by  considering  the  individual  elements  of  (37)  and  factor¬ 
ing  out  the  second  term.  Thus  the  3D  Euclidean  infor¬ 
mation  can  be  reconstructed  from  (31)  provided  that  the 
denominator  of  (40)  is  non-zero. 


References 

[1]  J.  Oliensis,  “A  critique  of  structure  from  motion  al¬ 
gorithms,”  Computer  Vision  and  Image  Understand¬ 
ing ,  vol.  80,  no.  2,  pp.  172-214,  2000. 


7 


[2]  F.  Chaumette,  S.  Boukir,  P.  Bouthemy,  and  D.  Ju- 
vin,  “Structure  from  controlled  motion,”  IEEE 
Trans.  Pattern  Anal.  Machine  Intel L,  vol.  18,  no.  5, 
pp.  492-504,  May  1996. 

[3]  A.  Chiuso,  P.  Favaro,  H.  Jin,  and  S.  Soatto,  “Struc¬ 
ture  from  motion  causally  integrated  over  time,” 
IEEE  Trans.  Pattern  Anal.  Machine  Intel l.,  vol.  24, 
no.  4,  pp.  523-535,  Apr.  2002. 

[4]  H.  Durrant- Whyte  and  T.  Bailey,  “Simultaneous  lo¬ 
calization  and  mapping:  Part  i,”  IEEE  Robot.  Au¬ 
tomat.  Mag.,  vol.  13,  no.  3,  pp.  99-108,  2006. 

[5]  A.  J.  Davison,  I.  D.  Reid,  N.  D.  Molton,  and 
O.  Stasse,  “Monoslam:  Real-time  single  camera 
slam,”  IEEE  Trans.  Pattern  Anal.  Machine  Intel l., 
vol.  29,  no.  6,  June  2007. 

[6]  T.  Fukao,  K.  Fujitani,  and  T.  Kanade,  “An  au¬ 
tonomous  blimp  for  a  surveillance  system,”  in  Proc. 
IEEE  Int.  Conf.  Intelligent  Robots  and  Systems,  Las 
Vegas,  NV,  2003,  pp.  1820-1825. 

[7]  T.  Kanade,  O.  Arnidi,  and  Q.  Ke,  “Real-time  and  3d 
vision  for  autonomous  small  and  micro  air  vehicles,” 
in  Proc.  IEEE  Int.  Conf.  Decision  and  Control,  Dec. 
2004,  pp.  1655-1662. 

[8]  J.  D.  Redding,  T.  W.  McLain,  R.  W.  Beard,  and 
C.  N.  Taylor,  “Vision-based  target  localization  from 
a  fixed-wing  miniature  air  vehicle,”  in  Proc.  Amer¬ 
ican  Control  Conf.,  Minneapolis,  MN,  2006,  pp. 
2862-2867. 

[9]  V.  N.  Dobrokhodov,  I.  I.  Kaminer,  K.  D.  Jones,  and 
R.  Ghabcheloo,  “Vision-based  tracking  and  motion 
estimation  for  moving  targets  using  small  uavs,”  in 
Proc.  American  Control  Conf.,  Minneapolis,  MN, 
2006,  pp.  1428-1433. 

[10]  J.-H.  Kim  and  S.  Sukkarieh,  “Airborne  simultaneous 
localisation  and  map  building,”  in  Proc.  IEEE  Int. 
Conf.  Robots  and  Automation,  Taipei,  Taiwan,  2003, 
pp.  406-411. 

[11]  I.-K.  Jung  and  S.  Lacroix,  “High  resolution  terrain 
mapping  using  low  altitude  aerial  stereo  imagery,” 
in  Proc.  IEEE  Int.  Conf.  Computer  Vision,  Nice, 
France,  2003,  pp.  1820-1825. 

[12]  I.  Miyagawa  and  K.  Arakawa,  “Motion  and  shape 
recovery  based  on  iterative  stabilization  for  modest 
deviation  from  planar  motion,”  IEEE  Trans.  Pattern 
Anal.  Machine  Intel l.,  vol.  28,  no.  7,  pp.  1176-1181, 
July  2006. 

[13]  S.  J.  Julier  and  J.  K.  Uhlmann,  “A  counter  example 
to  the  theory  of  simultaneous  localization  and  map 
building,”  in  Proc.  IEEE  Int.  Conf.  Robots  and  Au¬ 
tomation,  Seoul,  Korea,  2001,  pp.  4238-4243. 


[14]  M.  Jankovic  and  B.  K.  Ghosh,  “Visually  guided  rang¬ 
ing  from  observations  of  points,  lines  and  curves  via 
an  identifier  based  nonlinear  observer,”  Systems  and 
Control  Letters,  vol.  25,  pp.  63-73,  1995. 

[15]  X.  Chen  and  H.  Kano,  “State  observer  for  a  class  of 
nonlinear  systems  and  its  application  to  machine  vi¬ 
sion,”  IEEE  Trans.  Automat.  Contr.,  vol.  49,  no.  11, 
pp.  2085-2091,  Nov.  2004. 

[16]  X.  Hu  and  T.  Ersson,  “Active  state  estimation  of 
nonlinear  systems,”  Automatica,  vol.  40,  p.  2075 
2082,  2004. 

[17]  V.  K.  Chitrakaran,  D.  M.  Dawson,  J.  Chen,  and 
H.  Kannan,  “Velocity  and  structure  estimation  of  a 
moving  object  using  a  monocular  camera,”  in  Proc. 
American  Control  Conf.,  Minneapolis,  MN,  2006, 
pp.  5159-5164. 

[18]  V.  K.  Chitrakaran  and  D.  M.  Dawson,  “A  lyapunov- 
based  method  for  estimation  of  euclidean  position 
of  static  features  using  a  single  camera,”  in  Proc. 
American  Control  Conf.,  New  York,  NY,  2007,  to 
appear. 

[19]  W.  E.  Dixon,  Y.  Fang,  D.  M.  Dawson,  and  T.  J. 
Flynn,  “Range  identification  for  perspective  vision 
systems,”  IEEE  Trans.  Automat.  Contr.,  vol.  48, 
no.  12,  pp.  2232-2238,  2003. 

[20]  D.  Karagiannis  and  A.  Astolfi,  “A  new  solution 
to  the  problem  of  range  identification  in  perspec¬ 
tive  vision  systems,”  IEEE  Trans.  Automat.  Contr., 
vol.  50,  no.  12,  pp.  2074-2077,  2005. 

[21]  O.  Faugeras,  Three-Dimensional  Computer  Vision. 
Cambridge,  Massachusetts:  MIT  Press,  1993. 

[22]  V.  Chitrakaran,  D.  M.  Dawson,  W.  E.  Dixon,  and 
J.  Chen,  “Identification  of  a  moving  objects  velocity 
with  a  fixed  camera,”  Automatica,  vol.  41,  no.  3,  pp. 
553-562,  Mar.  2005. 

[23]  E.  Malis  and  F.  Chaumette,  “2  1/2  d  visual  servoing 
with  respect  to  unknown  objects  through  a  new  esti¬ 
mation  scheme  of  camera  displacement,”  Int.  Jour¬ 
nal  of  Computer  Vision,  vol.  37,  no.  1,  pp.  79-97, 
2000. 

[24]  J.  Chen,  D.  M.  Dawson,  W.  E.  Dixon,  and  A.  Behai, 
“Adaptive  homography-based  visual  servo  tracking 
for  a  fixed  camera  configuration  with  a  camera-in¬ 
hand  extension,”  IEEE  Trans.  Contr.  Syst.  Technol., 
vol.  13,  no.  5,  pp.  814-825,  Sept.  2005. 

[25]  Open  source  computer  vi¬ 
sion  library.  [Online].  Available: 

http:  / /www. intel.com/technology/computing/opencv/ 

[26]  S.  Baker  and  S.  K.  Nayar,  “A  theory  of  single¬ 
viewpoint  catadioptric  image  formation,”  Int.  Jour¬ 
nal  of  Computer  Vision,  vol.  35,  no.  2,  pp.  175-196, 
1999. 


[27]  C.  Geyer  and  K.  Daniilidis,  “Paracatadioptric  cam¬ 
era  calibration,”  IEEE  Trans.  Pattern  Anal.  Ma¬ 
chine  Intel l.,  vol.  24,  no.  5,  pp.  687-695,  May  2002. 

[28]  D.  Aiken,  S.  Gupta,  G.  Hu,  and  W.  E.  Dixon, 
“Lyapunov-based  range  identification  for  a  paracata¬ 
dioptric  system,”  in  Proc.  IEEE  Int.  Conf.  Decision 
and  Control ,  San  Diego,  CA,  2006,  pp.  3879-3884. 


9 


